pacman::p_load(tidyverse, sf, sfdep, tmap, mapview, plotly, knitr)Take-home Exercise 1: Geospatial Analytics for Public Good
1 Introduction
1.1 Background
Buses and the Mass Rapid Transit (MRT) system are the main modes of transport used by Singaporeans for their daily commutes. With an ever increasing population, the key challenge for public bus operators is balancing supply against demand by optimising its services, fleets, and manpower for different bus routes.
Hence, the identification and analysis of the movement patterns of public buses can provide insights on commuting in Singapore, allowing for the development of better urban management strategies by both the public and private sectors.
1.2 The Study Area
The study area is Singapore, a city-state in Southeast Asia, with a total land area of about 730 square kilometres and a total population of 5.92 million (as at June 2023).
Commuting by public bus is the most common mode of transport in Singapore, with an average daily ridership of about 3.461 million in 2022, according to the LTA. The public bus fleet is approximately 5,800, plying more than 300 routes and 5,000 bus stops.
1.3 The Analytical Questions
In this take-home exercise, I aim to answer the following analytical questions:
Is the demand for public bus transport evenly distributed geographically?
If no, are there signs of spatial clustering?
If yes, where are these clusters, and what kind of clusters are they?
Based on the analysis, what are the insights that can be derived for urban transport planning?
1.4 Objectives and Tasks
Hence, the objectives of this take-home exercise are to:
Apply Exploratory Spatial Data Analysis (ESDA) to obtain a preliminary understanding of public bus movements in Singapore; and
Apply appropriate Local Indicators of Spatial Association (LISA) Analysis or Emerging Hot Spot Analysis (EHSA) to undercover the spatial and/or spatio-temporal mobility patterns of public bus passengers in Singapore.
The detailed tasks are:
Geovisualisation and Analysis:
Compute passenger trips generated by origin at the hexagon level for:
Peak Hour Period Bus Tap On Time Weekday Morning Peak 6am to 9am Weekday Afternoon Peak 5pm to 8pm Weekend/Holiday Morning Peak 11am to 2pm Weekend/Holiday Evening Peak 4pm to 7pm Display the geographical distribution of the passenger trips using appropriate geovisualisation method.
Describe the spatial patterns revealed by the geovisualisation.
EITHER:
Local Indicators of Spatial Association (LISA) Analysis:
Compute LISA of the passengers trips generated by origin at the hexagon level.
Display the LISA maps of the passengers trips generated by origin at the hexagon level - for those that are statistically significant.
Draw statistical conclusions from the analysis results.
OR:
Emerging Hot Spot Analysis (EHSA) with reference to the passenger trips generated by origin at the hexagon level for the four time intervals explored in 1. above:
Perform Mann-Kendall Test using the spatio-temporal local Gi* values.
Prepare EHSA maps of the Gi* values of the passenger trips generated by origin at the hexagon level - for those that are statistically significant.
Describe the spatial patterns revealed with reference to the EHSA maps and data visualisation prepared.
Either 2. or 3. is required to be completed. I have completed 2. for this take-home exercise.
2 Getting Started
2.1 Setting the Analytical Tools
The R packages used in this take-home exercise are:
tidyverse (i.e. readr, tidyr, dplyr) for performing data science tasks such as importing, tidying, and wrangling data;
sf for importing, managing, and processing geospatial data;
sfdep for analysing spatial dependence and spatial relationships in data (building on spdep);
tmap for thematic mapping;
mapview for interactive viewing of spatial objects;
plotly for making interactive plots; and
knitr for embedding R code in different document formats (e.g., HTML) to facilitate dynamic report generation.
They are loaded into the R environment:
2.2 Data Sources
The Land Transport Authority (LTA) studies commuter movements using the data collected from the use of smart cards and the Global Positioning System (GPS) devices on public buses. The LTA DataMall shares some of these data publicly, which helps to speed up the development of practical solutions to enhance reliability and efficiency of the transport system by other stakeholders, such as the private sector and individuals.
The data sets used in this take-home exercise are:
Passenger Volume by Origin Destination Bus Stops from the LTA DataMall for the period of August 2023 to October 2023. There are three aspatial data sets (one for each month) in csv format. It contains data on the number of trips by weekdays and weekends from origin to destination bus stops.
Bus Stop Location from the LTA DataMall. This is a geospatial data set in ESRI shapefile format. It contains a point representation that indicates the position of each bus stop where buses stop to pick up or drop off passengers.
The data sets are placed under two sub-folders:
geospatial (Bus Stop Location), and
aspatial (Passenger Volume by Origin Destination Bus Stops).
These two sub-folders are within the data folder of my Take-home_Ex1 folder.
3 Data Wrangling
3.1 Preparing the Aspatial Data - Passenger Volume
3.1.1 Importing and Merging Data
The three csv files are imported and combined into a tibble data frame, odbus, using functions from the base package as shown in the following codes:
- Generate list of file names of csv files using the
list.files()function in the base package.
Show the code
filenames = list.files(path="data/aspatial/", pattern="*.csv")
filenames[1] "origin_destination_bus_202308.csv" "origin_destination_bus_202309.csv"
[3] "origin_destination_bus_202310.csv"
- Generate path to csv files using the
file.path()function in the base package.
Show the code
path = file.path("data/aspatial", filenames)
path[1] "data/aspatial/origin_destination_bus_202308.csv"
[2] "data/aspatial/origin_destination_bus_202309.csv"
[3] "data/aspatial/origin_destination_bus_202310.csv"
- Merge the three csv files using the
do.call()andlapply()functions in the base package and theread.csv()function from the readr package.
odbus = do.call("rbind", lapply(path, FUN=function(files){ read.csv(files)}))- Check using the
unique()function in the base package to confirm that the data sets for each of the three months have been incorporated. [Note: Although the three data sets have been merged into a single data frame, the observations would be analysed month by month, instead of aggregating all observations.]
Show the code
unique(odbus$YEAR_MONTH)[1] "2023-08" "2023-09" "2023-10"
Once the combined data set has been obtained, the n_distinct() function in the dplyr package and the sum() function in the base package are applied to uncover the following observations regarding odbus:
Overall, there are 17,118,005 rows (observations) and 7 columns. 5,709,512, 5,714,196, and 5,694,297 rows (observations) from August, September, and October 2023 respectively, reflecting a generally stable total ridership over the three-month period.
The “YEAR_MONTH” column has three unique values, reflecting the observations from August, September, and October 2023.
The “DAY_TYPE” column has two unique values, reflecting observations from weekday or weekends/holiday.
The “TIME_PER_HOUR” has 24 unique values, reflecting that the observations are broken down into each of the 24 hours of a day.
The “PT_TYPE” column refers only has the value of “BUS” as the public transport type. Hence, it may be dropped.
The “ORIGIN_PT_CODE” and “DESTINATION_PT_CODE” columns have 5,075 and 5,079 unique values respectively, reflecting the number of bus stops with bus routes passing through them.
The “TOTAL_TRIPS” column contains values that reflect the number of passengers for each unique month, type of day, origin bus stop, destination bus stop. The minimum value is 1, i.e., there are no rows with zero values.
sapply(odbus, function(x) n_distinct(x)) YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE
3 2 24 1
ORIGIN_PT_CODE DESTINATION_PT_CODE TOTAL_TRIPS
5075 5079 4854
Show the code
sum(odbus$YEAR_MONTH == "2023-08")[1] 5709512
Show the code
sum(odbus$YEAR_MONTH == "2023-09")[1] 5714196
Show the code
sum(odbus$YEAR_MONTH == "2023-10")[1] 5694297
3.1.2 Checking for Duplicates and Missing Values
The data sets from the LTA DataMall are expected to be relatively clean. Nevertheless, due diligence checks for duplicates and missing values are still made to confirm the assumptions.
The duplicated() function in the base package is used to check for duplicates in odbus. There are no duplicates in the tibble data frame.
odbus[duplicated(odbus), ][1] YEAR_MONTH DAY_TYPE TIME_PER_HOUR
[4] PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
[7] TOTAL_TRIPS
<0 rows> (or 0-length row.names)
The colSums() function in the base package is used to check for missing values in odbus. There are no missing values in the tibble data frame, odbus.
colSums(is.na(odbus)) YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE
0 0 0 0
ORIGIN_PT_CODE DESTINATION_PT_CODE TOTAL_TRIPS
0 0 0
3.1.3 Filtering for Peak Period Hours and Selecting Columns
The filter() and select() functions in the dplyr package are used to filter for the rows with the “DAY_TYPE” and “TIME_PER_HOUR” that are studied in this take-home exercise (i.e., “TIME_PER_HOUR” = 6-9am, and 5-8pm on weekdays; and 11-2pm, and 4-7pm on weekends), as well as remove the columns are that not required (i.e., “PT_TYPE” and “DESTINATION_PT_CODE”).
odbus = odbus %>%
filter((DAY_TYPE == "WEEKDAY" & TIME_PER_HOUR %in% c(6, 7, 8, 9, 17, 18, 19, 20))|(DAY_TYPE == "WEEKENDS/HOLIDAY" & TIME_PER_HOUR %in% c(11, 12, 13, 14, 16, 17, 18, 19))) %>%
select(-c(PT_TYPE, DESTINATION_PT_CODE))The output is saved in rds file format and imported into the R environment.
Show the code
write_rds(odbus, "data/rds/bus.rds")
odbus = read_rds("data/rds/bus.rds")3.2 Preparing the Geospatial Data - Bus Stop Location
3.2.1 Importing and Transforming the Geospatial Data - Bus Stop Location
The Bus Stop Location shapefile is imported using st_read() in the sf package. The output is a simple feature data frame, busstop, which is in the SVY21 projected coordinates systems. It has 5,161 features and 3 fields, which include the geometry points indicating the bus stop locations.
busstop = st_read(dsn = "data/geospatial",
layer = "BusStop")Reading layer `BusStop' from data source
`C:\jmphosis\ISSS624\Take-home_Ex\Take-home_Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
glimpse(busstop)Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
The st_crs() function in the sf package is then used to check the coordinate system of the busstop simple feature data frame. The output shows that although it is projected in SVY21, the EPSG is indicated as 9001, which is incorrect given that it should be 3414 instead.
st_crs(busstop)Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
To correct the EPSG, the st_set_crs() function in the sf package is applied. A check to confirm that the projection transformation has been applied is then made using the st_crs() function again.
busstop = st_set_crs(busstop, 3414)
st_crs(busstop)Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
3.2.2 Checking for Duplicates and Missing Values
The data sets from the LTA DataMall are expected to be relatively clean. Nevertheless, due diligence checks for duplicates and missing values are still made to confirm the assumptions.
The duplicated() function in the base package and st_geometry() function in the sf package are used to check for duplicates in busstop. The output returned Bus Stop No. 96319 at row 3265. On closer inspection of the simple feature data frame, rows 3264 and 3265 are found to be duplicates. Hence, the row 3264 is removed using the row_number() function in the dplyr package. The same check is conducted again to confirm that there are no more duplicates.
busstop[duplicated(st_geometry(busstop)), ]Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 42187.23 ymin: 34995.78 xmax: 42187.23 ymax: 34995.78
Projected CRS: SVY21 / Singapore TM
BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
3265 96319 NIL YUSEN LOGISTICS POINT (42187.23 34995.78)
busstop = busstop %>%
filter(row_number() != 3264)
busstop[duplicated(st_geometry(busstop)), ]Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
The colSums() function in the base package is used to check for missing values in busstop. There are no missing values in the simple feature data frame, busstop for all columns except “LOC_DESC”. On closer inspection, there are several bus stops without descriptions but they are retained because they have “BUS_STOP_N” values.
colSums(is.na(busstop))BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
0 0 8 0
3.2.3 Selecting Columns and Removing Bus Stops in Malaysia
The select() function in dplyr package is used to drop the “BUS_ROOF_N” column in busstop since it is not required. As there are bus stops situated in Johor Bahru, Malaysia, they are removed using the filter() function in the dplyr package.
busstop = busstop %>%
select(-BUS_ROOF_N) %>%
filter(LOC_DESC != "JOHOR BAHRU CHECKPT" & LOC_DESC != "LARKIN TER")The output is saved in rds file format and imported into the R environment.
Show the code
write_rds(busstop, "data/rds/busstop.rds")
busstop = read_rds("data/rds/busstop.rds")3.3 Performing Relational Join
The simple feature data frame, busstop, and the tibble data frame, odbus, are combined using the left_join() function in the dplyr package, by matching the “BUS_STOP_N” column in busstop with the “ORIGIN_PT_CODE” in odbus. However, as the “BUS_STOP_N” values are character values, they would need to be converted to integer values first before the join, so that values such as “01012” in busstop can be matched to values such as “1012” in odbus.
Upon joining the two data frames, the output is the simple feature data frame (since busstop was placed as the first argument in the left_join() function), bus. The number of rows (observations), based on odbus, was reduced from 7,903,394 to 7,848,515. This is likely because some newer bus stops such as Bus Stop No. 3549 (Shenton Way Stn Exit 3) and Bus Stop No. 96461 (Bef Aviation Pk Rd) were not included in the Bus Stop Location shapefile that was last updated in July 2023. This means that the analysis in this take-home exercise is limited in this aspect.
busstop = busstop %>%
mutate(BUS_STOP_N = as.integer(BUS_STOP_N))
bus = left_join(busstop, odbus,
by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))The output is saved in rds file format and imported into the R environment.
Show the code
write_rds(bus, "data/rds/bus.rds")
bus = read_rds("data/rds/bus.rds")
rm(odbus, busstop)3.4 Creating Spatial Hexagon Grid
Spatial grids are commonly used in spatial analysis. Regularly shaped grids may comprise of equilateral triangles, squares, or hexagons. The hexagon is the most circular-shaped polygon that can tessellate to form an evenly spaced grid, providing a low perimeter-to-area ratio that reduces sampling bias due to edge effects of the grid shape.
A more circular polygon means that points near the border are closer to the centroid. Hexagons are often used when the analysis involves aspects of connectivity or movement paths. Locating neighbours is simpler using a hexagon grid because the contact edge or length is consistent on each side, resulting in equidistant centroids for each neighbour.
A spatial hexagon grid, hgrid, for the study area of Singapore is created using the following codes:
geo = st_make_grid(bus, c(500, 500), what = "polygons", square = FALSE)
hgrid = st_sf(geo) %>%
mutate(grid_id = 1:length(lengths(geo)))4 Exploratory Data Analysis - Computing, Visualising, and Deriving Insights on Passenger Trips by Origin
4.1 Computing Passenger Trips by Origin
The filter() function in the dplyr package is used to obtain subsets of the simple feature data frame, bus for each of the three months and each of the four peak periods.
wdAMAug = bus %>%
filter(YEAR_MONTH == "2023-08") %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9)Show the code for other subsets
wdPMAug = bus %>%
filter(YEAR_MONTH == "2023-08") %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20)
weAMAug = bus %>%
filter(YEAR_MONTH == "2023-08") %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14)
wePMAug = bus %>%
filter(YEAR_MONTH == "2023-08") %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19)
wdAMSep = bus %>%
filter(YEAR_MONTH == "2023-09") %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9)
wdPMSep = bus %>%
filter(YEAR_MONTH == "2023-09") %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20)
weAMSep = bus %>%
filter(YEAR_MONTH == "2023-09") %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14)
wePMSep = bus %>%
filter(YEAR_MONTH == "2023-09") %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19)
wdAMOct = bus %>%
filter(YEAR_MONTH == "2023-10") %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9)
wdPMOct = bus %>%
filter(YEAR_MONTH == "2023-10") %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20)
weAMOct = bus %>%
filter(YEAR_MONTH == "2023-10") %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14)
wePMOct = bus %>%
filter(YEAR_MONTH == "2023-10") %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19)The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.
wdAMAug_ct = st_join(hgrid, wdAMAug) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))Show the code for other subsets
wdPMAug_ct = st_join(hgrid, wdPMAug) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
weAMAug_ct = st_join(hgrid, weAMAug) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
wePMAug_ct = st_join(hgrid, wePMAug) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
wdAMSep_ct = st_join(hgrid, wdAMSep) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
wdPMSep_ct = st_join(hgrid, wdPMSep) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
weAMSep_ct = st_join(hgrid, weAMSep) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
wePMSep_ct = st_join(hgrid, wePMSep) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
wdAMOct_ct = st_join(hgrid, wdAMOct) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
wdPMOct_ct = st_join(hgrid, wdPMOct) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
weAMOct_ct = st_join(hgrid, weAMOct) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))
wePMOct_ct = st_join(hgrid, wePMOct) %>%
drop_na() %>%
group_by(grid_id) %>%
summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))4.2 Preliminary Visualising Passenger Trips by Origin Bus Stops
The various subsets can then be plotted using functions in the tmap package. For illustration purposes, three common types of inputs for the “style” argument and custom breaks are tried out below for Aug 2023 weekday morning peak period.
- style = “quantile” - the number of categories is taken from the “n” argument, which has a default value of 5.
Show the code
tmap_mode("plot")
tm_shape(wdAMAug_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
style = "quantile",
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Aug 2023: style = quantile")
- style = “jenks” - this produces a map based on natural breaks.
Show the code
tmap_mode("plot")
tm_shape(wdAMAug_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
style = "jenks",
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Aug 2023: style = jenks")
- style = “equal” - which produces a map with equal intervals for each category.
Show the code
tmap_mode("plot")
tm_shape(wdAMAug_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
style = "equal",
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Aug 2023: style = equal")
- Custom breaks set at fixed intervals of 100,000 passengers.
Show the code
brk = c(1, 100000, 200000, 300000, 400000, 500000, 600000)
tmap_mode("plot")
tm_shape(wdAMAug_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Aug 2023: custom breaks")
4.3 Weekday Morning Peak (6am to 9.59am)
The custom breaks approach for mapping the passenger trips by origin bus stops are used for descriptive analysis.
The plots from the three months for the weekday morning peak period are visualised using functions from the tmap package:
tmap_mode("view")
tm_shape(wdAMAug_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Aug 2023")Show the code
tmap_mode("view")
tm_shape(wdAMSep_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Sep 2023")Show the code
tmap_mode("view")
tm_shape(wdAMOct_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Oct 2023")The passenger trips by origin bus stops for the weekday morning peak period shows that the largest numbers of passenger trips originate from bus interchanges in the heartland areas, namely:
West: Boon Lay Bus Interchange (409,412 passenger trips in Aug 2023; Grid ID: 1251) and Clementi Bus Interchange (202,467 passenger trips in Aug 2023; Grid ID: 2054).
North: Woodlands Bus Interchange (313,192 passenger trips in Aug 2023; Grid ID: 2411).
Central: Ang Mo Kio Bus Interchange (212,340 passenger trips in Aug 2023; Grid ID: 3239).
East: Tampines Bus Interchange (202,366 passenger trips in Aug 2023; Grid ID: 4539).
The only non-bus interchange bus stop is the one in the north-east region at “After Punggol Road” leading to Tampines Expressway (TPE), with 207,583 passenger trips in Aug 2023. This is possibly due to the large number of commuters heading to work at Changi Airport in the east.
The above observation underscores the importance of bus interchanges in the weekday morning peak period as major transportation hubs for commuters to get to work. This is given that they are the first bus stop for a large number of bus routes. This provides useful information for urban planners and transport authorities in that they could examine whether the existing bus interchanges need to be expanded or improved to accommodate such high demands.
4.4 Weekday Afternoon Peak (5pm to 8.59pm)
The plots from the three months for the weekday afternoon peak period are visualised using functions from the tmap package:
Show the code
tmap_mode("view")
tm_shape(wdPMAug_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Afternoon Peak in Aug 2023")Show the code
tmap_mode("view")
tm_shape(wdPMSep_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Afternoon Peak in Sep 2023")Show the code
tmap_mode("view")
tm_shape(wdPMOct_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Afternoon Peak in Oct 2023")The passenger trips by origin bus stops for the weekday afternoon peak period also shows that the largest numbers of passenger trips originate from bus interchanges in the heartland areas, namely:
West: Boon Lay Bus Interchange (554,436 passenger trips in Aug 2023).
North: Woodlands Bus Interchange (474,311 passenger trips in Aug 2023).
Central: Ang Mo Kio Bus Interchange (325,858 passenger trips in Aug 2023).
East: Tampines Bus Interchange (446,207 passenger trips in Aug 2023).
Again, the above observation underscores the importance of bus interchanges in the weekday afternoon peak period as major transportation hubs for commuters to get home from work. This is likely because of commuters returning from work who take buses from these bus interchanges which are beside MRT stations, i.e., the different bus routes are like the spoke of a bicycle wheel, with the MRT station as the hub. This provides useful information for urban planners and transport authorities in that they could examine whether the existing bus interchanges need to be expanded or improved to accommodate such high demands.
The observation could also reflect that these various towns are not just heartland areas where people live but are economic hubs with offices nearby where people work. This indicates that the Government’s efforts to decentralise economic activities away may have some effect.
4.5 Weekend/Holiday Morning Peak (11am to 2.59pm)
The plots from the three months for the weekend/holiday morning peak period are visualised using functions from the tmap package:
Show the code
tmap_mode("view")
tm_shape(weAMAug_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Morning Peak in Aug 2023")Show the code
tmap_mode("view")
tm_shape(weAMSep_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Morning Peak in Sep 2023")Show the code
tmap_mode("view")
tm_shape(weAMOct_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Morning Peak in Oct 2023")The passenger trips by origin bus stops for the weekend/holiday morning peak period shows that the numbers are generally lower. The bus stops (which are interchanges) with the highest numbers of passenger trips are:
West: Boon Lay Bus Interchange (121,029 passenger trips in Sep 2023).
North: Woodlands Bus Interchange (108,572 passenger trips in Sep 2023).
East: Bedok Bus Interchange (111,194 passenger trips in Sep 2023) and Tampines Bus Interchange (107,719 passenger trips in Sep 2023).
This shows that the bus interchanges in the north, east and west play a very important role in serving as origin hubs for commuters who may be heading to the town area (or other areas in Singapore) to enjoy their time off. This also indicates that these towns are likely to have large populations.
The overall lower numbers on the weekend/holiday, even during the morning peak period shows that bus deployment can be kept relatively lean. Again, this is useful information for urban planners and transport authorities to understand and plan for the needs of bus commuters.
4.6 Weekend/Holiday Evening Peak (4pm to 7.59pm)
The plots from the three months for the weekend/holiday evening peak period are visualised using functions from the tmap package:
Show the code
tmap_mode("view")
tm_shape(wePMAug_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Evening Peak in Aug 2023")Show the code
tmap_mode("view")
tm_shape(wePMSep_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Evening Peak in Sep 2023")Show the code
tmap_mode("view")
tm_shape(wePMOct_ct) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Evening Peak in Oct 2023")The passenger trips by origin bus stops for the weekend/holiday evening peak period shows that the numbers are generally lower. The bus stops (which are interchanges) with the highest numbers of passenger trips are:
West: Boon Lay Bus Interchange (165,722 passenger trips in Sep 2023).
North: Woodlands Bus Interchange (136,163 passenger trips in Sep 2023).
Central: Ang Mo Kio Bus Interchange (105,693 passenger trips in Aug 2023).
East: Bedok Bus Interchange (129,365 passenger trips in Sep 2023).and Tampines Bus Interchange (137,526 passenger trips in Aug 2023).
This shows that the various bus interchanges in the play a very important role in serving as origin hubs for commuters who may be heading to the town area (or other areas in Singapore) to enjoy their time off, or heading home after visiting another heartland area in Singapore.
The overall lower numbers on the weekend/holiday, even during the evening peak period shows that bus deployment can be kept relatively lean. Again, this is useful information for urban planners and transport authorities to understand and plan for the needs of bus commuters.
4.7 Comparing Between Different Peak Periods
4.7.1 Weekday Morning versus Afternoon
The plots from the three months for the weekday morning peak period versus weekday afternoon period are visualised using functions from the tmap package.
The filter() function is applied so that only the hexagons with more than 50,000 passenger trips by origin bus stops are shown.
tmap_mode("view")
tmap_arrange(tm_shape(wdAMAug_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Aug 2023"),
tm_shape(wdPMAug_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Afternoon Peak in Aug 2023"),
ncol = 2)tmap_mode("view")
tmap_arrange(tm_shape(wdAMSep_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Sep 2023"),
tm_shape(wdPMSep_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Afternoon Peak in Sep 2023"),
ncol = 2)tmap_mode("view")
tmap_arrange(tm_shape(wdAMOct_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Morning Peak in Oct 2023"),
tm_shape(wdPMAug_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekday Afternoon Peak in Oct 2023"),
ncol = 2)The key difference between the weekday morning versus afternoon peak period is that the latter has busy bus stops in the central region, such as Bras Besah, Bugis, Tanjong Pagar.
Another important difference is that the key bus interchanges highlighted previously for weekday morning and afternoon peak periods generally face higher volumes in the afternoon than morning. This could mean that more commuters head home in that 4-hour afternoon peak period as compared to the morning peak period. This could mean that the morning peak hour commuter crowd is more spread out (i.e., beyond the 4-hour period), possibly due to more flexible work arrangements (e.g., telecommuting from home in the morning and heading into office after the morning peak period).
4.7.2 Weekend/Holiday Morning versus Evening
The plots from the three months for the weekend/holiday morning peak period versus weekend/holiday evening period are visualised using functions from the tmap package:
tmap_mode("view")
tmap_arrange(tm_shape(weAMAug_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Morning Peak in Aug 2023"),
tm_shape(wePMAug_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Evening Peak in Aug 2023"),
ncol = 2)tmap_mode("view")
tmap_arrange(tm_shape(weAMSep_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Morning Peak in Sep 2023"),
tm_shape(wePMSep_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Evening Peak in Sep 2023"),
ncol = 2)tmap_mode("view")
tmap_arrange(tm_shape(weAMOct_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Morning Peak in Oct 2023"),
tm_shape(wePMOct_ct %>% filter(TOTAL_TRIPS >= 50000)) +
tm_fill(col = "TOTAL_TRIPS",
palette = "Reds",
breaks = brk,
title = "No. of Passenger Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "Weekend/Holiday Evening Peak in Oct 2023"),
ncol = 2)The key difference between the weekend/holiday morning versus evening peak period is that the latter has busy bus stops in the central region, such as at Orchard and Chinatown, as well as the north-east region, such as Punggol and Sengkang.
Similarly, another important difference is that the key bus interchanges highlighted previously for weekend morning and evening peak periods generally face higher volumes in the evening than morning. This could mean that more commuters head home in that 4-hour evening peak period as compared to the morning peak period. This could mean that the morning peak hour commuter crowd is more spread out (i.e., beyond the 4-hour period).
5 Local Indicators of Spatial Association (LISA) Analysis
Local Indicators of Spatial Association (LISA) are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. The appropriate LISA, including local Moran’s I, are applied to detect cluster(s) and/or outlier(s) for the Bus Passenger Trips in the bus simple feature data frame, month by month.
5.1 Computing Distance-based Weights
Distance-based weights are used in this take-home exercise. Distance-based weights are more appropriate than contiguity spatial weights in this context given that the hexagons are spread out across Singapore, and some hexagons may not have immediate neighbours (i.e., contiguous) due to reasons such as being separated by reservoirs or waterways.
There are three popularly used distance-based spatial weights, they are: fixed distance weights, adaptive distance weights, and inverse distance weights (IDW). For this take-home exercise, a combination of IDW and adaptive distance weights are used. IDW assigns higher weights to neighbours that are closer and lower weights to neighbours that are further away, while adaptive distance weights balances out the number of neighbours for each hexagon regardless of how densely populated its surrounding areas is (i.e., density of nearby bus stops).
The
st_knn()function in the sfdep package is used to identify neighbors based on k (i.e. k = 8 indicates the nearest eight neighbours). The output is a list of neighbours (i.e.nb).The
st_inverse_distance()function in the sfdep package is then used to calculate inverse distance weights of neighbours in thenb.
wdAMAug_wt = wdAMAug_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)Show the code for other subsets
wdPMAug_wt = wdPMAug_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
weAMAug_wt = weAMAug_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
wePMAug_wt = wePMAug_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
wdAMSep_wt = wdAMSep_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
wdPMSep_wt = wdPMSep_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
weAMSep_wt = weAMSep_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
wePMSep_wt = wePMSep_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
wdAMOct_wt = wdAMOct_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
wdPMOct_wt = wdPMOct_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
weAMOct_wt = weAMOct_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)
wePMOct_wt = wePMOct_ct %>%
mutate(nb = st_knn(geo,
k=8),
wt = st_inverse_distance(nb,
geo,
scale = 1,
alpha = 1),
.before = 1)5.2 Computing Local Indicators of Spatial Association (LISA) of Passenger Trips by Origin
The local_moran() function in the sfdep package is used to compute the local Moran’s I value of “TOTAL_TRIPS” at the hexagon level. The output is a tibble data frame, lisa.
The output is a simple feature data frame containing the columns ii, eii, var_ii, z_ii, p_ii, p_ii_sim, and p_folded_sim.
ii: Local moran statistic.
eii: Expectation of local moran statistic; for local_moran_perm(), the permutation sample means.
var_ii: Variance of local moran statistic; for local_moran_perm(), the permutation sample standard deviations.
z_ii: Standard deviation of local moran statistic; for local_moran_perm(), based on permutation of sample means and standard deviations
p_ii: p-value of local moran statistic using pnorm(); for local_moran_perm(), using standard deviation based on permutation of sample means and standard deviations
p_ii_sim: For local_moran_perm(), rank() and punif() of observed statistic rank for [0, 1] p-values using “alternative=”
p_folded_sim: The simulation folded [0, 0.5] range ranked p-value.
skewness: For local_moran_perm(), the output of e1071::skewness() for the permutation samples underlying the standard deviates
kurtosis: For local_moran_perm(), the output of e1071::kurtosis() for the permutation samples underlying the standard deviates.
The
unnest()function in the tidyr package is used to expand a list-column containing data frames into rows and columns.In terms of interpretation:
A positive local Moran’s I value indicates a hexagon close to similar values (High-High or Low-Low);
A negative value indicates a hexagon close to dissimilar values (High-Low or Low-High); and
A value near zero suggests no significant local spatial autocorrelation.
wdAMAug_lisa = wdAMAug_wt %>%
mutate(local_moran = local_moran(TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)Show the code for other subsets
wdAMSep_lisa = wdAMSep_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
wdAMOct_lisa = wdAMOct_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
wdPMAug_lisa = wdPMAug_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
wdPMSep_lisa = wdPMSep_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
wdPMOct_lisa = wdPMOct_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
weAMAug_lisa = weAMAug_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
weAMSep_lisa = weAMSep_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
weAMOct_lisa = weAMOct_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
wePMAug_lisa = wePMAug_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
wePMSep_lisa = wePMSep_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)
wePMOct_lisa = wePMOct_wt %>%
mutate(local_moran = local_moran(
TOTAL_TRIPS, nb, wt, nsim = 99),
.before = 1) %>%
unnest(local_moran)5.3 Creating Local Indicators of Spatial Association (LISA) Maps of Passenger Trips by Origin
The LISA map is a categorical map showing outliers and clusters. There are two types of outliers namely: High-Low and Low-High outliers. Likewise, there are two type of clusters namely: High-High and Low-Low clusters. The LISA map is an interpreted map combining local Moran’s I of geographical areas and their respective p-values.
wdAMAug_lisasig = wdAMAug_lisa %>%
filter(p_ii_sim < 0.05)
wdAMAug_lisamap = tm_shape(wdAMAug_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekday Morning Peak in Aug 2023")Show the code for other subsets
wdAMSep_lisasig = wdAMSep_lisa %>%
filter(p_ii_sim < 0.05)
wdAMSep_lisamap = tm_shape(wdAMSep_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekday Morning Peak in Sep 2023")
wdAMOct_lisasig = wdAMOct_lisa %>%
filter(p_ii_sim < 0.05)
wdAMOct_lisamap = tm_shape(wdAMOct_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekday Morning Peak in Oct 2023")
wdPMAug_lisasig = wdPMAug_lisa %>%
filter(p_ii_sim < 0.05)
wdPMAug_lisamap = tm_shape(wdPMAug_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekday Afternoon Peak in Aug 2023")
wdPMSep_lisasig = wdPMSep_lisa %>%
filter(p_ii_sim < 0.05)
wdPMSep_lisamap = tm_shape(wdPMSep_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekday Afternoon Peak in Sep 2023")
wdPMOct_lisasig = wdPMOct_lisa %>%
filter(p_ii_sim < 0.05)
wdPMOct_lisamap = tm_shape(wdPMOct_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekday Afternoon Peak in Oct 2023")
weAMAug_lisasig = weAMAug_lisa %>%
filter(p_ii_sim < 0.05)
weAMAug_lisamap = tm_shape(weAMAug_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekend/Holiday Morning Peak in Aug 2023")
weAMSep_lisasig = weAMSep_lisa %>%
filter(p_ii_sim < 0.05)
weAMSep_lisamap = tm_shape(weAMSep_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekend/Holiday Morning Peak in Sep 2023")
weAMOct_lisasig = weAMOct_lisa %>%
filter(p_ii_sim < 0.05)
weAMOct_lisamap = tm_shape(weAMOct_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekend/Holiday Morning Peak in Oct 2023")
wePMAug_lisasig = wePMAug_lisa %>%
filter(p_ii_sim < 0.05)
wePMAug_lisamap = tm_shape(wePMAug_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekend/Holiday Evening Peak in Aug 2023")
wePMSep_lisasig = wePMSep_lisa %>%
filter(p_ii_sim < 0.05)
wePMSep_lisamap = tm_shape(wePMSep_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekend/Holiday Evening Peak in Sep 2023")
wePMOct_lisasig = wePMOct_lisa %>%
filter(p_ii_sim < 0.05)
wePMOct_lisamap = tm_shape(wePMOct_lisasig) +
tm_fill(col = "mean",
title = "Cluster/Outlier",
id = "grid_id",
showNA = FALSE,
alpha = 0.5,
popup.vars = c("Cluster/Outlier: " = "mean")) +
tm_borders(col = "grey40", lwd = 0.7) +
tm_layout(title = "LISA Map of Weekend/Holiday Evening Peak in Oct 2023")5.4 Weekday Morning Peak (6am to 9.59am)
The LISA plots from the three months for the weekday morning peak period are visualised using functions from the tmap package:
Show the code for other subsets
tmap_mode("view")
wdAMAug_lisamapShow the code
tmap_mode("view")
wdAMSep_lisamapShow the code
tmap_mode("view")
wdAMOct_lisamapThere are hexagons with statistically significant high-high values surrounding the hexagons highlighted previously for the weekday morning peak period - bus interchanges in Boon Lay, Clementi, Woodlands, Ang Mo Kio, and Tampines, as well as the “After Punggol Road” bus stop leading to the TPE. The hexagons with the bus interchanges themselves are not shown to be statistically significant but their surrounding hexagons are found to have high-high values (i.e., they also have high numbers of passenger trips) or low-high values (i.e., they have low numbers of passenger trips and are outliers).
The presence of the hexagons with high-high values highlights the congestion areas in these various towns and provides useful information for urban planners and transport authorities in that they could examine whether the bus stops around these bus interchanges need to be expanded or improved to accommodate such high demands.
Separately, the hexagons with statistically significant low-low values in the west and north regions reflect that these areas are unlikely to have congestion problems. This is because they are industrial areas and not housing estates.
5.5 Weekday Afternoon Peak (5pm to 8.59pm)
The LISA plots from the three months for the weekday afternoon peak period are visualised using functions from the tmap package:
Show the code
tmap_mode("view")
wdPMAug_lisamapShow the code
tmap_mode("view")
wdPMSep_lisamapShow the code
tmap_mode("view")
wdPMOct_lisamapThere are few hexagons with statistically significant high-high values for the weekday afternoon peak period even though several bus interchanges were highlighted to have high numbers of passenger trips. This meant that the congestion issue is not severe in the weekday afternoon peak period.
The hexagons with statistically significant low-low values reflect that the industrial areas in the west and north regions do not face congestion problems in the weekday afternoon peak period.
5.6 Weekend/Holiday Morning Peak (11am to 2.59pm)
The LISA plots from the three months for the weekend/holiday morning peak period are visualised using functions from the tmap package:
Show the code
tmap_mode("view")
weAMAug_lisamapShow the code
tmap_mode("view")
weAMSep_lisamapShow the code
tmap_mode("view")
weAMOct_lisamapThere are hexagons with statistically significant high-high values surrounding the hexagons highlighted previously for the weekend/holiday morning peak period - bus interchanges in Woodlands, Bedok, and Tampines. Interestingly, they are also surrounded by some hexagons with low-high values (i.e., they have low numbers of passenger trips and are outliers).
On the other hand, the Boon Lay bus interchange is surrounded by hexagons with statistically significant low-high values, showing that they have low numbers of passenger trips and are outliers. This could also mean that there are no other bus stops in the area that is sharing the load of passenger trips with the bus interchange.
Overall, the presence of the hexagons with high-high values and low-high values highlights the congested and non-congested areas in these various towns and provides useful information for urban planners and transport authorities in that they could examine whether the bus stops around these bus interchanges need to be expanded or improved to accommodate such high demands, and divert human traffic away from the bus interchanges during the weekend/holiday morning peak period.
Separately, the hexagons with statistically significant low-low values in the west, north, and east regions reflect that these areas are unlikely to have congestion problems during the weekend/holiday morning peak period. This is because they are industrial areas and not housing estates.
5.7 Weekend/Holiday Evening Peak (4pm to 7.59pm)
The LISA plots from the three months for the weekend/holiday evening peak period are visualised using functions from the tmap package:
Show the code
tmap_mode("view")
wePMAug_lisamapShow the code
tmap_mode("view")
wePMSep_lisamapShow the code
tmap_mode("view")
wePMOct_lisamapThere are hexagons with statistically significant high-high values in the central region at Bras Basah, Bugis, and Toa Payon for the weekend/holiday evening peak period. This highlights the areas that urban planners and transport authorities can focus on in terms of deploying sufficient buses during the peak period to reduce congestion.
The bus interchanges (Boon Lay, Woodlands, Ang Mo Kio, Bedok, and Tampines) that were highlighted previously for the weekend/holiday evening peak period are mostly surrounded by hexagons with statistically significant low-high values, showing that they have low numbers of passenger trips and are outliers. This could also mean that there are no other bus stops in the area that is sharing the load of passenger trips with the bus interchanges.
Separately, the hexagons with statistically significant low-low values in the west, and north regions reflect that these areas are unlikely to have congestion problems during the weekend/holiday evening peak period. This is because they are industrial areas and not housing estates.
6 Conclusion
The take-home exercise has highlighted the bus commuting patterns of Singapore commuters for the four different peak periods, based on number of passenger trips from the origin bus stops. Overall, the LISA maps have highlighted the key problem areas for urban planners and transport authorities to focus their attention on. The analyses in this take-home exercise is limited in that it does not analyse MRT commuter patterns (which is another major public transport mode) alongside bus commuter patterns given that commuters often take a combination of these two transport modes in a single journey. Hence, some of the observations made throughout the take-home exercise cannot be ascertained more accurately and the statistically significant clusters highlighted should be further studied.
~~~ End of Take-home Exercise 1 ~~~